11 research outputs found
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires
credit assignment to events far back in the past. The most common method for
training recurrent neural networks, back-propagation through time (BPTT),
requires credit information to be propagated backwards through every single
step of the forward computation, potentially over thousands or millions of time
steps. This becomes computationally expensive or even infeasible when used with
long sequences. Importantly, biological brains are unlikely to perform such
detailed reverse replay over very long sequences of internal states (consider
days, months, or years). However, humans are often reminded of past memories or
mental states which are associated with the current mental state. We consider
the hypothesis that such memory associations between past and present could be
used for credit assignment through arbitrarily long sequences, propagating the
credit assigned to the current state to the associated past state. Based on
this principle, we study a novel algorithm which only back-propagates through a
few of these temporal skip connections, realized by a learned attention
mechanism that associates current states with relevant past states. We
demonstrate in experiments that our method matches or outperforms regular BPTT
and truncated BPTT in tasks involving particularly long-term dependencies, but
without requiring the biologically implausible backward replay through the
whole history of states. Additionally, we demonstrate that the proposed method
transfers to longer sequences significantly better than LSTMs trained with BPTT
and LSTMs trained with full self-attention.
Comment: To appear as a Spotlight presentation at NIPS 201
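The retrieval step described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch of sparse dot-product attention over a memory of past hidden states, not the paper's implementation: the function name, shapes, and choice of k are ours, and the credit-assignment part (back-propagating only through the selected skip connections) requires an autodiff framework and is omitted.

```python
import numpy as np

def sparse_attend(current, memory, k):
    """Score every stored past state against the current state and
    keep only the top-k associations (the "reminding" step)."""
    scores = memory @ current                 # dot-product relevance, one per past state
    top = np.argsort(scores)[-k:][::-1]       # indices of the k best matches
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                  # softmax over the k survivors only
    summary = weights @ memory[top]           # weighted sum of attended past states
    return top, weights, summary

rng = np.random.default_rng(0)
memory = rng.standard_normal((100, 8))        # 100 past hidden states of size 8
current = 5.0 * memory[42]                    # current state strongly resembles state 42
top, w, s = sparse_attend(current, memory, k=5)
```

In a full model, gradients would flow from the loss at the current step only into `memory[top]`, i.e. through a handful of skip connections rather than through every intermediate time step.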
Deep Complex Networks
At present, the vast majority of building blocks, techniques, and
architectures for deep learning are based on real-valued operations and
representations. However, recent work on recurrent neural networks and older
fundamental theoretical analysis suggests that complex numbers could have a
richer representational capacity and could also facilitate noise-robust memory
retrieval mechanisms. Despite their attractive properties and potential for
opening up entirely new neural architectures, complex-valued deep neural
networks have been marginalized due to the absence of the building blocks
required to design such models. In this work, we provide the key atomic
components for complex-valued deep neural networks and apply them to
convolutional feed-forward networks and convolutional LSTMs. More precisely, we
rely on complex convolutions, and present algorithms for complex
batch normalization and complex weight-initialization strategies for
complex-valued neural nets, which we use in experiments with end-to-end
training schemes. We demonstrate that such complex-valued models are
competitive with their real-valued counterparts. We test deep complex models on
several computer vision tasks, on music transcription using the MusicNet
dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve
state-of-the-art performance on these audio-related tasks.
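The complex building blocks above reduce to real-valued operations through the algebra of complex multiplication. A minimal sketch, with a dense layer standing in for a convolution (the identity is the same per output unit); the function name and shapes are illustrative, not the paper's API:

```python
import numpy as np

def complex_linear(x_re, x_im, W_re, W_im):
    """Complex-valued linear map realized with four real matrix products:
    (W_re + i W_im)(x_re + i x_im)
      = (W_re x_re - W_im x_im) + i (W_re x_im + W_im x_re)."""
    y_re = x_re @ W_re.T - x_im @ W_im.T
    y_im = x_re @ W_im.T + x_im @ W_re.T
    return y_re, y_im

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)       # complex input
W = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))  # complex weights
y_re, y_im = complex_linear(x.real, x.imag, W.real, W.imag)
```

Implemented this way, a complex layer can reuse ordinary real-valued kernels (matmul or convolution) from any deep-learning framework.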
Applications of complex numbers to deep neural networks
In the past decade, a convergence of hardware, software and theory has allowed artificial intelligence to experience a renewal: a "spring" that, unlike previous times, seems to have led not to a burst hype bubble and a new "AI winter", but to a lasting "summer", anchored by tangible advances in the field.
One of the key such advances is truly "deep" learning. In many applications, the architects of neural networks have had great success by deepening them, and there is now little doubt about the value of deep, composable, hierarchical representations learned automatically from examples.
But there exist other, less-well-explored research avenues, such as alternatives to the most commonly used number system, the real numbers: low-precision numbers, complex numbers, quaternions. In 2017, one of my primary collaborators and I discussed the seeming lack of interest given to purely complex-valued processing of digital signals, either directly available in complex form or convertible to it using e.g. the Fourier transform (1D, 2D, short-time or not). Since this area seemed under-explored, we threw ourselves into it and, after a year spent tackling the challenges of architecting and initializing neural networks with purely complex-valued internal representations, we obtained promising results in computer vision and music spectrum prediction. We also expose the pitfalls of naively initializing and normalizing such complex-valued networks, and address them with procedures adapted to complex numbers.